NSF PAR Search | NSF Public Access Repository

How effective is matrix reordering for improving performance of sparse matrix-vector multiplication?

https://doi.org/10.1145/3731599.3767441

Asudeh, Omid; Mahdipour_Saravani, Sina; Rastello, Fabrice; Sabin, Gerald; Sadayappan, Ponnuswamy (November 2025, ACM)

Free, publicly-accessible full text available November 15, 2026
Tightening I/O Lower Bounds through the Hourglass Dependency Pattern

https://doi.org/10.1145/3626183.3659986

Eyraud-Dubois, Lionel; Iooss, Guillaume; Langou, Julien; Rastello, Fabrice (June 2024, ACM)

When designing an algorithm, one cares about arithmetic/computational complexity, but data movement (I/O) complexity plays an increasingly important role that highly impacts performance and energy consumption. For a given algorithm and a given I/O model, scheduling strategies such as loop tiling can reduce the required I/O down to a limit, called the I/O complexity, inherent to the algorithm itself. The objective of I/O complexity analysis is to compute, for a given program, its minimal I/O requirement among all valid schedules. We consider a sequential execution model with two memories, an infinite one, and a small one of size 𝑆 on which the computations retrieve and produce data. The I/O is the number of reads and writes between the two memories. We identify a common “hourglass pattern” in the dependency graphs of several common linear algebra kernels. Using the properties of this pattern, we mathematically prove tighter lower bounds on their I/O complexity, which improves the previous state-of-the-art bound by a parametric ratio. This proof was integrated inside the IOLB automatic lower bound derivation tool.
Full Text Available
PALMED: Throughput Characterization for Superscalar Architectures

https://doi.org/10.1109/CGO53902.2022.9741289

Derumigny, Nicolas; Bastian, Theophile; Gruber, Fabian; Iooss, Guillaume; Guillon, Christophe; Pouchet, Louis-Noel; Rastello, Fabrice (April 2022, 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO))

Full Text Available
IOOpt: automatic derivation of I/O complexity bounds for affine programs

https://doi.org/10.1145/3453483.3454103

Olivry, Auguste; Iooss, Guillaume; Tollenaere, Nicolas; Rountev, Atanas; Sadayappan, P.; Rastello, Fabrice (June 2021, 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation)
Full Text Available
Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures

https://doi.org/10.1109/SC41405.2020.00091

Kurt, Sureyya Emre; Sukumaran-Rajam, Aravind; Rastello, Fabrice; Sadayappan, P. (November 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis)
Full Text Available
PolyBench/Python: benchmarking Python environments with polyhedral optimizations

https://doi.org/10.1145/3446804.3446842

Abella-González, Miguel Á.; Carollo-Fernández, Pedro; Pouchet, Louis-Noël; Rastello, Fabrice; Rodríguez, Gabriel (February 2021, CC 2021: 30th ACM SIGPLAN International Conference on Compiler Construction)
Python has become one of the most used and taught languages nowadays. Its expressiveness, cross-compatibility and ease of use have made it popular in areas as diverse as finance, bioinformatics or machine learning. However, Python programs are often significantly slower to execute than an equivalent native C implementation, especially for computation-intensive numerical kernels. This work presents PolyBench/Python, implementing the 30 kernels in PolyBench/C, one of the standard benchmark suites for polyhedral optimization, in Python. In addition to the benchmark kernels, a functional wrapper including mechanisms for performance measurement, testing, and execution configuration has been developed. The framework includes support for different ways to translate C-array codes into Python, offering insight into the tradeoffs of Python lists and NumPy arrays. The benchmark performance is thoroughly evaluated on different Python interpreters, and compared against its PolyBench/C counterpart to highlight the profitability (or lack thereof) of using Python for regular numerical codes.
Full Text Available
Automated derivation of parametric data movement lower bounds for affine programs

https://doi.org/10.1145/3385412.3385989

Olivry, Auguste; Langou, Julien; Pouchet, Louis-Noël; Sadayappan, P.; Rastello, Fabrice (June 2020, 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation)

Full Text Available
Building a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of Nonaffine Programs Scalable

https://doi.org/10.1145/3363785

Selva, Manuel; Gruber, Fabian; Sampaio, Diogo; Guillon, Christophe; Pouchet, Louis-Noël; Rastello, Fabrice (January 2020, ACM Transactions on Architecture and Code Optimization)

Full Text Available
Analytical cache modeling and tilesize optimization for tensor contractions

https://doi.org/10.1145/3295500.3356218

Li, Rui; Sukumaran-Rajam, Aravind; Veras, Richard; Low, Tze Meng; Rastello, Fabrice; Rountev, Atanas; Sadayappan, P. (November 2019, International Conference for High Performance Computing, Networking, Storage and Analysis)

Full Text Available
Data-flow/dependence profiling for structured transformations

https://doi.org/10.1145/3293883.3295737

Gruber, Fabian; Selva, Manuel; Sampaio, Diogo; Guillon, Christophe; Moynault, Antoine; Pouchet, Louis-Noël; Rastello, Fabrice (January 2019, Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming)

Full Text Available
